Introduction
An ‘Artificial Neural Network’ (ANN), also referred to as a ‘Neural Net’, is a collection of simple, interconnected processing units that, in this combined structure, can perform far more complex functions than the individual (disconnected) units could achieve. Figure 1 presents the concept of a basic neural net.
Figure 1: The basic neural net
The formation of the Neural Net was inspired by the form and operation of the human brain. The brain consists of around 100 billion biological neurons, each of which acts as a simple processing unit. These biological neurons are connected to form the complex network known as our nervous system. In the brain, the neuron is a living cell (often referred to as the ‘brain cell’) and the networked interconnections between neurons are formed by biological structures.
Before proceeding it should be noted that Artificial Neural Networks were inspired by the human brain but were not an attempt to accurately replicate it, i.e., the construction of early ANNs was based on a simplified conception of the brain. Today we know more about the complex structures and functions of our brains and we can see a myriad of ways in which the brain differs from an ANN. That being said, it is still of some benefit to review the simplified model of the brain that inspired early ANN design.
The biological neuron
Figure 2 presents a simple representation of a basic biological neuron:
Figure 2: The biological neuron
Nicolas.Rougier / CC BY-SA
In the biological neuron, the dendrites act to receive signals passed on from other neurons or from the outside world, as is the case for sensory neurons. In the cell body (in particular the nucleus) the signal is processed. This processing determines how the neuron will act in accordance with what input signals it has received. The axon then transmits the output from the cell body to the synapses so that it can be communicated to other neurons (or muscles / glands in the case of motor neurons). Information is transmitted in the form of ‘Action Potentials’, which travel down the axon to the synapses which in turn act as the outgoing interface.
All of the functions of a human being, from gathering information via our senses, to processing that information as conscious or subconscious thought, to subsequently acting on those thoughts, are governed by the complex network of neurons in our nervous system.
- Sensory neurons convert sense data (such as the sensation of heat) to action potentials. Action potentials can be thought of as a biological encoding of that data and these action potentials are transmitted from sensory neurons to our brain via the central nervous system.
- Interneurons process this encoded data in our brains; some of this processing may enter our consciousness in the form of a thought and some may be processed sub-consciously. Interneurons subsequently produce new action potentials that govern the intended action.
- These new action potentials are transmitted from the brain to motor neurons, again via the central nervous system. The motor neurons then stimulate muscles to contract or relax to control how the body moves and interacts with the world around us.
Remember that this conception of how information is processed through the nervous system has been vastly simplified; it is, however, this basic conceptualisation (the networked connection of simple processing units) that inspired the artificial neural network. Before we move on to the artificial neural network, it is worth looking more closely at how the neuron was conceptualised in this model.
How the neuron operates
Having seen how the brain works as a functional network of interconnecting neurons, it is also worth considering how the individual neuron works, as this has also inspired ANN design. At the simplest level, neurons, when stimulated, produce pulses called action potentials. A neuron becomes stimulated in response to an accumulation of chemical signals at its dendrites. These signals may be caused by pulses generated by neighbouring neurons or, in the case of sensory neurons, by some environmental influence.
Before being stimulated a neuron is said to be polarised. In this state, the cell body of the neuron is primed and ready to discharge its action potential once it receives a suitable stimulus. Each cell body has an associated stimulation threshold which, when exceeded, causes the neuron to fire, i.e., to produce its own action potential. A neuron might receive stimulation from one or several sources, but only once enough cumulative stimulation is detected will it initiate its pulse.
Note that while the action potential is an electrical signal in the axon, it does not work in precisely the same way as the flow of electrons through a wire. The electrical pulse of an action potential is transmitted along the axon via an exchange of ions and, as a result, it travels at a relatively slow pace compared to a regular flow of electrons (up to around a hundred meters per second). Despite these differences, the pulse can still be measured as a potential difference (a change in voltage) that travels down the axon of the neuron. Its profile takes the following approximate form:
Figure 3: The voltage profile of an action potential
Due to the short duration of these action potentials, they are often represented as simple voltage spikes. A strongly stimulated neuron produces several short successive pulses and a weakly stimulated neuron produces fewer:
Figure 4: Strongly / weakly stimulated neurons
Now that we know how the neuron functions in the brain, it is worth considering how the brain learns.
How the brain learns
In the previous section, we described the process by which action potentials are produced at the cell body of the neuron and conducted down the axon towards the synapses. The synapse is the interface between a neuron and its neighbouring neurons in the network. The synapse is an important component of the system of information transmission through the neural network and it influences how the neural network ‘learns’. At the end of the axon is a tiny bulb called the ‘Synaptic Bulb’, separated from the dendrites of a neighbouring neuron by a microscopic, nanometer-scale gap called the ‘Synaptic Cleft’, as illustrated:
Figure 5: The Synapse
When an action potential reaches the end of the axon, it stimulates the release of chemicals called neurotransmitters, which are held in the synaptic bulb. The neurotransmitters cross the synaptic cleft and stimulate the dendrite of the next neuron. One mode in which a network could learn was proposed by Donald Hebb in 1949. Hebb postulated that if a synapse is activated often, its chemical pathway gets ‘strengthened’: upon regular stimulation, the synapse becomes conditioned and releases more neurotransmitter than it would if unconditioned. This conditioning causes that particular pathway through the network to become ‘stronger’, while lesser-used pathways remain comparatively ‘weaker’. Whilst this is once again a simplification of the learning mechanism of the brain, it informed the learning mechanisms that were initially implemented in artificial neural networks.
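Hebb’s postulate is often summarised as ‘neurons that fire together wire together’, and a common rate-based formulation is Δw = η·x·y, where x and y are the pre- and post-synaptic activity levels. The following minimal Python sketch illustrates this idea; the function and parameter names are illustrative and not part of Hebb’s original formulation:

```python
# A minimal sketch of a Hebbian-style weight update (rate-based form).
# The names and the learning rate eta are illustrative assumptions.
def hebbian_update(weight, pre_activity, post_activity, eta=0.1):
    """Strengthen a connection in proportion to correlated activity."""
    return weight + eta * pre_activity * post_activity

w = 0.2
for _ in range(5):                    # repeated co-activation of both neurons...
    w = hebbian_update(w, 1.0, 1.0)   # ...progressively strengthens the pathway
print(round(w, 2))                    # 0.7
```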
We shall now move on to explore how early artificial neural networks were designed, mimicking some aspects of the human brain.
The artificial neural network
The first model of an artificial neuron was proposed by Warren McCulloch and Walter Pitts in 1943 and was accordingly named the McCulloch-Pitts (‘MP’) neuron. In this model, the neuron accepts several binary inputs and produces a binary output of 1 if the sum of the inputs exceeds a threshold value set by the programmer. The MP neuron was later improved upon by Frank Rosenblatt in 1957, who named his new artificial neuron the ‘Perceptron’. The Perceptron model of the artificial neuron accepted non-Boolean input values, applied a weighting factor to each input and then summed the weighted inputs. This summation of weighted input values was then compared against a threshold value and, once again, if the threshold was exceeded the Perceptron output a Boolean value of 1, otherwise 0.
It was later shown that by connecting several Perceptrons into a network, this emerging system could be used to accomplish very complex functions, and thus the Artificial Neural Network was devised.
The Artificial Neuron
The following diagram illustrates the Perceptron model of the Artificial Neuron:
Figure 6: The Perceptron model of the Artificial Neuron
Chrislb / CC BY-SA
In this notation:
- the inputs to the Perceptron are represented by ‘x_i’. In the biological sense, these inputs correspond to the activity detected at the dendrites of the neuron.
- each input is weighted by a factor ‘w_ij’. In the biological sense, this weight corresponds to the strength of the synaptic connection.
- the weighted sum of the inputs, ‘netj’, is the activation level of the Perceptron, calculated as:

$$\text{net}_j = \sum_{i=1}^{n} w_{ij} x_i$$

- the activation level of the Perceptron is compared against the threshold θj to determine the binary output Oj as follows:

$$O_j = \begin{cases} 1 & \text{if } \text{net}_j > \theta_j \\ 0 & \text{otherwise} \end{cases}$$
In summary, the Perceptron takes its inputs, weights them according to how ‘strong’ the connection is and if the sum of all weighted inputs is above a certain threshold, the neuron ‘fires’. Later in these lessons, we shall explore how a neuron learns by changing its weights.
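To make this concrete, here is a minimal Python sketch of the Perceptron’s forward pass as just described (the function name and example values are illustrative):

```python
def perceptron_output(inputs, weights, threshold):
    """Sum the weighted inputs (netj) and apply the threshold test."""
    net = sum(x * w for x, w in zip(inputs, weights))  # netj = sum of x_i * w_ij
    return 1 if net > threshold else 0                 # Oj: fire (1) or not (0)

# Example: two inputs with differing connection strengths
print(perceptron_output([1.0, 0.5], [0.7, 0.4], threshold=0.8))  # 1, since 0.9 > 0.8
```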
The Perceptron model and the biological neuron differ in one important respect: the output from the Perceptron is binary (1 or 0), unlike the biological neuron, which can be strongly or weakly stimulated and is therefore capable of outputting an analogue degree of activation. In later iterations of the artificial neuron, this difference was overcome by replacing the binary threshold comparison with an analogue ‘Activation Function’.
The activation function
The purpose of the activation function is to determine the output of the artificial neuron based upon its inputs. The threshold function (aka. the hard limit function) is the most basic form of an activation function. The threshold function is mathematically and graphically expressed as follows:
The Threshold Activation Function
With input (v) and threshold set to value (θ), the output (ϕ) is defined by:

$$\phi(v) = \begin{cases} 1 & \text{if } v \geq \theta \\ 0 & \text{if } v < \theta \end{cases}$$
Over time, it was found that the binary nature of the threshold function made it difficult for early ANNs to learn, as small variations in the input produced no change at all in the output. In many applications these variations, although small, carried significant information.
The following piecewise linear function (‘PWL’) was devised to provide a continuous characteristic in an attempt to overcome this issue.
The Piecewise Linear Function
With input (v) and setpoints placed at (±½), the output (ϕ) is defined by:

$$\phi(v) = \begin{cases} 1 & v \geq \tfrac{1}{2} \\ v + \tfrac{1}{2} & -\tfrac{1}{2} < v < \tfrac{1}{2} \\ 0 & v \leq -\tfrac{1}{2} \end{cases}$$

Note that the setpoints may be placed at any value; ±½ is selected here for illustrative purposes.
The most widely used activation function is the sigmoid function. Similar to the piecewise linear function, the sigmoid function avoids the pitfalls of a binary activation function by providing a continuous output characteristic and further improves upon the PWL by smoothing the output around arbitrarily selected setpoints and by extending the output’s responsiveness to inputs beyond the setpoints of the piecewise linear function.
The Sigmoid Function
With input (v) and slope parameter (α), the output (ϕ) is defined by:

$$\phi(v) = \frac{1}{1 + e^{-\alpha v}}$$
Whilst the sigmoid function is the most widely used activation function today, it is still common to find any of the above functions (amongst several others) applied in neural networks. The selection of the activation function is often determined by the specific application in which the neural network is working.
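The three activation functions above can be sketched in a few lines of Python; this is an illustration under the definitions given, with the piecewise linear setpoints fixed at ±½:

```python
import math

def threshold(v, theta=0.0):
    """Hard limit: output 1 once the input reaches the threshold, else 0."""
    return 1.0 if v >= theta else 0.0

def piecewise_linear(v):
    """Linear between the setpoints at -1/2 and +1/2, saturated outside them."""
    if v >= 0.5:
        return 1.0
    if v <= -0.5:
        return 0.0
    return v + 0.5

def sigmoid(v, alpha=1.0):
    """Smooth, continuous output in (0, 1); alpha controls the slope."""
    return 1.0 / (1.0 + math.exp(-alpha * v))

for v in (-1.0, -0.25, 0.0, 0.25, 1.0):
    print(v, threshold(v), piecewise_linear(v), round(sigmoid(v), 3))
```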
Study exercise 1
The Perceptron model of an artificial neuron is illustrated below. Calculate the output (Oj) from the Perceptron, given the following input and weight parameters:
| Inputs | Value | Weights | Value |
|---|---|---|---|
| x1 | 0.1 | w1j | 0.8 |
| x2 | 0.5 | w2j | 0.2 |
| x3 | 0.3 | w3j | 0.5 |
Activation Function: threshold function, with threshold (θ) = 0.2.
Study exercise 1 - Answer
The output from the summation block (netj) is calculated as follows:

$$\text{net}_j = (0.1 \times 0.8) + (0.5 \times 0.2) + (0.3 \times 0.5) = 0.08 + 0.10 + 0.15 = 0.33$$

The output from the activation function (Oj) is calculated as follows:

$$\text{net}_j = 0.33 > \theta = 0.2 \;\Rightarrow\; O_j = 1$$
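This arithmetic can be cross-checked with a short Python snippet (a sketch; the variable names are illustrative):

```python
# Study Exercise 1: weighted sum followed by a hard threshold
inputs  = [0.1, 0.5, 0.3]
weights = [0.8, 0.2, 0.5]
net_j = sum(x * w for x, w in zip(inputs, weights))  # approx. 0.33
o_j = 1 if net_j > 0.2 else 0                        # threshold (theta) = 0.2
print(round(net_j, 2), o_j)                          # 0.33 1
```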
Study exercise 2
Repeat Study Exercise 1, in this case assuming that the artificial neuron has a sigmoid output function governed by the following parameters:
| Inputs | Value | Weights | Value |
|---|---|---|---|
| x1 | 0.1 | w1j | 0.8 |
| x2 | 0.5 | w2j | 0.2 |
| x3 | 0.3 | w3j | 0.5 |
Activation Function: sigmoid function, with slope constant (α) = 1.
Study exercise 2 - Answer
The output from the summation block (netj) is calculated as in Study Exercise 1:

$$\text{net}_j = (0.1 \times 0.8) + (0.5 \times 0.2) + (0.3 \times 0.5) = 0.33$$

The output from the activation function (Oj) is calculated as follows:

$$O_j = \frac{1}{1 + e^{-\alpha \, \text{net}_j}} = \frac{1}{1 + e^{-0.33}} \approx 0.58$$
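Again, a short Python snippet can verify the result (a sketch; the variable names are illustrative):

```python
import math

# Study Exercise 2: the same weighted sum, passed through a sigmoid
inputs  = [0.1, 0.5, 0.3]
weights = [0.8, 0.2, 0.5]
net_j = sum(x * w for x, w in zip(inputs, weights))  # approx. 0.33, as before
o_j = 1.0 / (1.0 + math.exp(-1.0 * net_j))           # sigmoid with alpha = 1
print(round(o_j, 3))                                 # approx. 0.582
```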
How an Artificial Neuron detects patterns
A common practical application of artificial neural networks is pattern detection. To demonstrate this capability, we shall consider a simple pattern-recognition task using just one artificial neuron with four inputs. The following pattern of four black and white squares can be represented in digital form (black = 1, white = 0) as illustrated below:
Pattern
Assigning Digital Values to the Pattern
Converting the Pattern to String of Values
| x1 | x2 | x3 | x4 |
|---|---|---|---|
| 1 | 0 | 0 | 1 |
The above processed image can then be fed into an artificial neuron whose weights have been selected (i.e., ‘programmed’) to detect this image, with the Perceptron outputting 1 if the pattern is detected and 0 otherwise:
| Programmed Parameter | Value |
|---|---|
| w1j | 1 |
| w2j | 0 |
| w3j | 0 |
| w4j | 1 |
| θ | 1.5 |
Processing this Perceptron proceeds as follows.

The output from the summation block (netj) is calculated as follows:

$$\text{net}_j = (1 \times 1) + (0 \times 0) + (0 \times 0) + (1 \times 1) = 2$$

The output from the activation function (Oj) is calculated as follows:

$$\text{net}_j = 2 > \theta = 1.5 \;\Rightarrow\; O_j = 1$$

Therefore, the pattern has been detected.
We can feed any other pattern of 2 black / 2 white squares into the Perceptron and find that the unit detects no pattern in each case:
NO pattern detected
NO pattern detected
NO pattern detected
Noise tolerance in pattern recognition
It should be noted that even if there were some ‘interference’ in the correct basic pattern (some of the white pixels not quite 0 and some of the black pixels not quite 1), the Perceptron programmed above would still be capable of correctly recognising the pattern:
Pattern detected
This attribute is termed ‘noise tolerance’, which is an important and potentially very useful capability of an artificial neural network. However, it should be immediately clear that the degree to which an ANN is noise tolerant needs careful consideration to ensure that it is not so noise tolerant that it starts to produce false positives in pattern recognition or any other application.
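The programmed detector and its noise tolerance can be illustrated with the following sketch (the noisy input values are invented for this example):

```python
def detect(pattern, weights=(1, 0, 0, 1), theta=1.5):
    """Perceptron programmed to detect the pattern (1, 0, 0, 1)."""
    net = sum(x * w for x, w in zip(pattern, weights))
    return 1 if net > theta else 0

print(detect((1, 0, 0, 1)))          # 1: target pattern detected (netj = 2)
print(detect((1, 1, 0, 0)))          # 0: another 2-black / 2-white pattern (netj = 1)
print(detect((0.9, 0.2, 0.1, 0.8)))  # 1: noisy version still detected (netj = 1.7)
```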
Teaching an Artificial Neuron to learn
In the previous example of pattern recognition, the weights were set by hand, calibrated to strengthen the Perceptron’s throughput when the correct pattern is input and diminish it when an incorrect pattern is input. This may be acceptable for single-neuron applications with simple patterns, but with increased complexity, calibration by hand quickly becomes impractical. One of the benefits of ANNs is that they can be trained to find the required weights themselves, removing the need for calibration by hand.
When introducing the Perceptron model, it was stated that the Artificial Neuron ‘learns’ by setting its own weights. The basic process of learning can be implemented by the following process:
- Initialise the neuron with random weighting factors
- Feed the neuron a known input pattern (where the expected output is also known).
- Calculate the actual output from the neuron.
- Compare the expected output to the actual output and assess how successfully the neuron has behaved.
- If the output is too high, reduce the weights associated with high inputs.
- If the output is too low, increase the weights associated with high inputs.
- If the output is correct, do not change the weights.
After some period of training has been completed, the weights of the network should settle around some final values. We may then initialise the neuron with these trained weighting factors. The user should now be able to feed the neuron an unknown input pattern and the neuron should be able to correctly identify it, i.e., the network has learned to identify that pattern through training.
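A minimal sketch of this training procedure is given below, using the classic perceptron learning rule (Δwᵢ = η × (target − output) × xᵢ, which implements the weight adjustments listed above); the learning rate, threshold and training patterns are illustrative assumptions:

```python
import random

def perceptron(inputs, weights, theta):
    """Forward pass: weighted sum compared against a fixed threshold."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net > theta else 0

# Known input patterns with known expected outputs: detect (1, 0, 0, 1)
examples = [((1, 0, 0, 1), 1), ((1, 1, 0, 0), 0),
            ((0, 0, 1, 1), 0), ((0, 1, 1, 0), 0)]

weights = [random.uniform(-0.5, 0.5) for _ in range(4)]  # 1. random initial weights
eta, theta = 0.1, 0.5                                    # learning rate, fixed threshold

for epoch in range(100):
    for inputs, target in examples:                  # 2. feed a known input pattern
        actual = perceptron(inputs, weights, theta)  # 3. calculate the actual output
        error = target - actual                      # 4. compare expected vs actual
        for i in range(4):                           # 5-7. adjust weights (no change if error = 0)
            weights[i] += eta * error * inputs[i]

print([round(w, 2) for w in weights])  # trained weights should settle around final values
```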
This is a basic description of how an artificial neural network can learn. In practical applications, training is often more complex, and it is also important to consider some additional aspects of the network:
- Which learning algorithm is used to adjust weights?
- How should the size and structure of multilayer networks be determined?
- Do the inputs need to be pre-processed into a form that the ANN is designed to accept?
- How much training is required? How does the network operate if undertrained? Could the network be over-trained?
- How accurate or noise tolerant do we want the network to be?
Many of these topics will be covered in future sections.
Summary
In this section, we described the basic artificial neuron: its structure, how it processes input data, its capability for pattern recognition and its ability to learn. In the next section, we shall be looking at some of the limitations of the artificial neuron and how these can be overcome by creating networks of interconnected neurons: the artificial neural network. Traditionally, artificial neural networks have been used for four main functions (although several more are possible):
- Classification – the ability to classify a given pattern/input dataset into a grouping or class.
- Prediction – the ability to predict the output of another system (such as the stock market behaviour) based on a set of inputs (such as stock prices and trends).
- Clustering – the ability to identify unique features of a dataset (often features unknown even to experts in those fields) and to then classify those datasets based on these features.
- Association – the ability to compare datasets and associate common aspects from one dataset to the other.
Since their first conception in the 1950s, artificial neural networks have been used for a wide range of applications including:
- Speech and voice recognition
- Financial forecasting
- Machine diagnostics for condition monitoring
- Medical diagnosis
- Intelligent search engine optimisation